Replication Code for: "In Safe Hands: The Financial and Real Impact of Investor Composition Over the Credit Cycle"
==============

Antonio Coppola, Stanford University (`acoppola@stanford.edu`)  

I. OVERVIEW OF THE CODE
--------------

This README file describes the overall structure of the replication package for this paper. The uppermost directory of the replication folder contains the following objects:

1. `README.md` (the file you are reading right now)
2. `analysis` (a folder)
3. `build` (a folder)
4. `cgs` (a folder)
5. `clean_trace` (a folder)
6. `docs` (a folder)
7. `insurance` (a folder)
8. `morningstar` (a folder)
9. `scripts` (a folder)
10. `data.zip` (a compressed folder)

The individual jobs are split across the various folders, which should be executed in the following sequence:

1. The code in the `insurance` folder imports the security-level holdings data for US insurers, which S&P Global provides from NAIC regulatory filings. It also builds the data used to construct consolidated holdings master files for insurers. The primary executable script is `Master_Insurance.sh`, which runs the processes in the `insurance` folder from start to finish, parallelizing the execution using the SLURM Workload Manager. The files named `Master_Insurance.do` and `Insurance_Stata_Controller.sh` are auxiliary support files used together with `Master_Insurance.sh` to orchestrate the jobs' execution, while the remaining files contain the relevant jobs.

2. The code in the `cgs` folder imports and builds reference security- and issuer-level data from S&P CUSIP Global Services (CGS), containing information about global CUSIP-bearing securities and their issuers which is used as part of the Morningstar holdings data build. The main executable file is `Master_CGS.sh`, which executes the jobs via SLURM together with the auxiliary support files `Master_CGS.do` and `CGS_Controller.sh`.

3. The code in the `morningstar` folder imports and executes the build of the Morningstar holdings data for global mutual funds and exchange-traded funds via SLURM. The build consists of a number of steps; the documentation file `Morningstar_Build_Details.md` describes them and explains how to execute the Morningstar build from start to finish.

4. The code in the `build` folder builds the samples that are used for the paper's analysis. This includes importing the remaining raw data sources and merging them with the holdings data built in the `insurance` and `morningstar` sections. The primary executable script is `Master_Build.sh`, which runs all the jobs in the `build` folder together with the auxiliary support files `Master_Build.do`, `Build_Stata_Controller.sh`, and `Build_Python_Controller.sh`. The rest of the files in the folder correspond to the various build jobs.

5. The code in the `analysis` folder runs the analysis, generating the tables and figures in the paper using the output from the `build` section. The primary executable script is `Master_Analysis.sh`, which executes the analysis jobs from start to finish together with the auxiliary support files `Master_Analysis.do`, `Analysis_Stata_Controller.sh`, and `Analysis_Python_Controller.sh`. The rest of the files in the folder correspond to the various analysis jobs.
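
Before starting the sequence above, it can help to confirm that the master scripts named in this README are present. The following is a minimal sketch, not part of the replication package: the function name `missing_master_scripts` is hypothetical, and the Morningstar build is left out because its entry point is described in `Morningstar_Build_Details.md` rather than here.

```python
from pathlib import Path

# Master scripts named in this README, in the order the sections should run.
# The Morningstar build is omitted on purpose: its entry point is documented
# in Morningstar_Build_Details.md, not in this file.
MASTER_SCRIPTS = [
    "insurance/Master_Insurance.sh",
    "cgs/Master_CGS.sh",
    "build/Master_Build.sh",
    "analysis/Master_Analysis.sh",
]


def missing_master_scripts(root):
    """Return the listed master scripts that are not found under root."""
    root = Path(root)
    return [rel for rel in MASTER_SCRIPTS if not (root / rel).is_file()]
```

Calling `missing_master_scripts(".")` from the top of the replication folder should return an empty list when all four scripts are in place.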

Additionally, the following folders provide a number of supporting files:

6. The folder `scripts` contains several ancillary scripts and program definitions that are used in the rest of the codebase. It also includes the Stata scheme file `scheme-shplot.scheme`, which should be installed in the user's `ado` folder and specifies the overall look of the graphs produced.

7. The folder `clean_trace` contains SAS scripts provided by the WRDS research team that are used to process the raw TRACE transaction-level database from FINRA, implementing the data cleaning steps in Dick-Nielsen (2014), which are common in the literature. This code is provided for reference, and it can be executed on the WRDS cloud servers (for instance, via the [WRDS SAS Studio](https://wrds-cloud.wharton.upenn.edu/SASStudio)) to produce the clean TRACE files (`raw/trace/trace_*_clean.dta`) that are used in the `build` and `analysis` folders.

8. The folder `docs` contains additional documentation, including a guide to the raw data used and additional technical execution details (see Section III).

9. The compressed folder `data.zip` shows the structure of the raw data folder. Where raw files are publicly available, the actual files are included in `data.zip`. For commercial data that requires licensing, the archive instead includes pseudo-data that illustrates the structure of the files.
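
To inspect the layout of `data.zip` without extracting it, a short script using Python's standard `zipfile` module is enough. This is an illustrative sketch; the helper name `top_level_entries` is an assumption and does not appear in the codebase.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level file and folder names inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Archive member names use forward slashes, so the first path
        # component is the top-level file or folder.
        return sorted({name.split("/")[0] for name in archive.namelist()})
```

For example, `print(top_level_entries("data.zip"))` lists the top-level contents of the archive when run from the replication folder.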

II. TECHNICAL NOTES
--------------

  - The code is built for Unix systems and assumes that the Stata, R, and Python interpreters are available on your executable path. The required versions are Stata 15+, Python 3.6+, and R 4.0.2+. Packages may need to be installed using a package manager (e.g. pip) as necessary.

  - The code also assumes that the host system is running the SLURM Workload Manager. The main executable pipelines are written for execution via SLURM, which is used to schedule and parallelize the various jobs.

  - Prior to running the code, users should perform a find-and-replace in the code folder for certain user-specific and system-specific globals. Details are provided in the `docs/globals_guide.md` file.
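
The path requirements in the notes above can be checked up front with a small script. The sketch below is an assumption-laden illustration: the executable names are guesses (Stata in particular is installed under different names depending on the edition), and the check only confirms that a command is on the path, not that its version meets the minimum.

```python
import shutil

# Assumption: these candidate executable names are illustrative and may
# differ on your system (e.g. Stata may be stata, stata-mp, or stata-se).
REQUIRED_TOOLS = {
    "Stata": ["stata", "stata-mp", "stata-se"],
    "R": ["Rscript", "R"],
    "Python": ["python3", "python"],
    "SLURM": ["sbatch"],
}


def find_missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return, sorted, the tools with no candidate executable on the path."""
    return sorted(
        tool
        for tool, candidates in required.items()
        if not any(which(name) for name in candidates)
    )
```

An empty list from `find_missing_tools()` suggests that all four tools are reachable; any names returned point to software that needs to be installed or added to the path.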

III. ADDITIONAL RESOURCES
--------------

The following text files in the `docs` folder provide further documentation:

1. The file `data_guide.md` provides a list of the raw input files used, together with short descriptions of the data.
2. The file `globals_guide.md` provides a list of the globals that need to be specified before running the code.
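
The find-and-replace of globals can be done in any editor; for completeness, here is a minimal Python sketch of the same operation. Everything specific in it is an assumption: the function name `replace_in_tree`, the list of file suffixes, and any placeholder string passed to it are hypothetical, and the actual globals to replace are the ones listed in `globals_guide.md`.

```python
from pathlib import Path


def replace_in_tree(root, old, new, suffixes=(".do", ".sh", ".py", ".R")):
    """Replace old with new in matching files under root; return changed paths.

    Assumption: the suffix list is illustrative and may need extending.
    Files are handled as bytes so their encoding and line endings are untouched.
    """
    old_bytes, new_bytes = old.encode("utf-8"), new.encode("utf-8")
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        data = path.read_bytes()
        if old_bytes in data:
            path.write_bytes(data.replace(old_bytes, new_bytes))
            changed.append(path)
    return changed
```

Because this rewrites files in place, it is safest to run it on a copy of the code folder or under version control, and to review the returned list of changed paths.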
